feat/batch creation #2665
This is now working, so I would appreciate some feedback on the design. The basic design is the same as what I outlined earlier in this PR: there are two new functions that take a …
Approach
Basically the same as concurrent group members listing, except we don't need any recursion. I'm scheduling writes and using …
New functions
Implicit groups
Partial hierarchies like …
Streaming
v2 vs v3 node creation
Creating v3 arrays / groups requires writing 1 metadata document, but v2 requires 2. To get the most concurrency I await the write of each metadata document separately, which means that …
Overlap with metadata consolidation logic
There's a lot of similarity between the stuff in this PR and routines used for consolidated metadata. It would be great to find ways to factor out some of the overlap areas.
Still to do:
That works for me. And what should the returned keys be for …
That sounds fine, as it's clear that the …
In the interest of a narrow scope, I've limited the public API to just …
Nice. The public API create_hierarchy looks good to me.
The test failure is unrelated to this PR (looks like an fsspec thing).
This PR adds a few routines for concurrently creating a collection of arrays and groups in storage, where the collection is a dict with path-like keys and ArrayMetadata / GroupMetadata values.
create_hierarchy takes a dict representation of a hierarchy, parses that dict to ensure that there are no implicit groups (creating group metadata documents as needed), then invokes create_nodes and yields the results.
create_nodes concurrently writes metadata documents to storage, and yields the created AsyncArray / AsyncGroup instances.
I still need to wire up concurrency limits, and test them.
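
To make the flow described above easier to review, here is a minimal, self-contained sketch of the same idea using only the standard library. It is not the code in this PR: `fill_implicit_groups`, `write_metadata`, `create_nodes_sketch`, and `create_hierarchy_sketch` are hypothetical names, the store is a plain dict, metadata documents are reduced to a `node_type` field, and one `zarr.json` key per node is assumed (the v3 layout, which needs a single metadata document per node). The semaphore shows one possible way to add the concurrency limit that is still to be wired up.

```python
import asyncio
from collections.abc import AsyncIterator
from pathlib import PurePosixPath


def fill_implicit_groups(nodes: dict[str, dict]) -> dict[str, dict]:
    """Return a copy of `nodes` with a group entry for every missing ancestor.

    Keys are path-like strings; "" stands for the root of the hierarchy.
    """
    out = dict(nodes)
    for path in nodes:
        for parent in PurePosixPath(path).parents:
            key = "" if str(parent) == "." else str(parent)
            # Only add a group where the caller did not declare a node.
            out.setdefault(key, {"node_type": "group"})
    return out


async def write_metadata(
    store: dict[str, dict], key: str, doc: dict, limit: asyncio.Semaphore
) -> str:
    """Write one metadata document, holding a slot of the concurrency limit."""
    async with limit:
        await asyncio.sleep(0)  # stand-in for real storage I/O
        store[key] = doc
    return key


async def create_nodes_sketch(
    store: dict[str, dict], nodes: dict[str, dict], concurrency: int = 4
) -> AsyncIterator[str]:
    """Schedule all writes at once and yield each key as its write finishes."""
    limit = asyncio.Semaphore(concurrency)
    tasks = [
        asyncio.create_task(
            write_metadata(store, f"{path}/zarr.json".lstrip("/"), doc, limit)
        )
        for path, doc in nodes.items()
    ]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def create_hierarchy_sketch(
    store: dict[str, dict], nodes: dict[str, dict], concurrency: int = 4
) -> AsyncIterator[str]:
    """Fill in implicit groups, then stream results from create_nodes_sketch."""
    filled = fill_implicit_groups(nodes)
    async for key in create_nodes_sketch(store, filled, concurrency):
        yield key
```

Because results are yielded in completion order rather than input order, a caller that needs a stable ordering would have to collect and sort the keys itself.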
TODO: